Investigating sentence weighting components for automatic summarisation
Abstract
The work described here initially formed part of a triangulation exercise to establish the effectiveness of the Query Term Order (QTO) algorithm. The methodology produced subsequently proved to be a reliable indicator of quality for summarising English web documents. We utilised the human summaries from the Document Understanding Conference data, and generated queries automatically for testing the QTO algorithm. Six sentence weighting schemes that made use of Query Term Frequency and QTO were constructed to produce system summaries, and this paper explains the process of combining and balancing the weighting components. We also examined the five automatically generated query terms in their different permutations to check whether the automatic generation of query terms introduced bias. The summaries produced were evaluated with the ROUGE-1 metric, and the results showed that using QTO in a weighting combination gave the best performance. We also found that combining several weighting components always performed better than any single weighting component.
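As a rough illustration of the evaluation step, ROUGE-1 compares the unigrams of a system summary with those of a human reference summary. The sketch below is a minimal version under simplifying assumptions (lowercasing, a simple regular-expression tokeniser, no stemming or stopword removal, one reference summary); the function names `unigrams` and `rouge_1` are hypothetical and it is not the official ROUGE toolkit or the configuration used in the paper.

```python
from collections import Counter
import re


def unigrams(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_1(system_summary, reference_summary):
    """Return (recall, precision, f1) based on clipped unigram overlap."""
    sys_counts = Counter(unigrams(system_summary))
    ref_counts = Counter(unigrams(reference_summary))
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((sys_counts & ref_counts).values())
    ref_total = sum(ref_counts.values())
    sys_total = sum(sys_counts.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / sys_total if sys_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1


if __name__ == "__main__":
    # Illustrative strings only, not data from the paper.
    print(rouge_1("the cat sat", "the cat sat on the mat"))
```

Recall is the figure most often reported for summarisation, since it measures how much of the human summary the system summary covers.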
Similar resources
The HOLJ Corpus: Supporting Summarisation Of Legal Texts
We describe an XML-encoded corpus of texts in the legal domain which was gathered for an automatic summarisation project. We describe two distinct layers of annotation: manual annotation of the rhetorical status of sentences and an entirely automatic annotation process incorporating a host of individual linguistic processors. The manual rhetorical status annotation has been developed as trainin...
An Approach for Query-Focused Text Summarisation for Evidence Based Medicine
We present an approach for extractive, query-focused, single-document summarisation of medical text. Our approach utilises a combination of target-sentence-specific and target-sentence-independent statistics derived from a corpus specialised for summarisation in the medical domain. We incorporate domain knowledge via the application of multiple domain-specific features, and we customise the answ...
Summarising text with a genetic algorithm-based sentence extraction
Automatic text summarisation has long been studied and used. The growth in the amount of information on the web results in more demands for automatic methods for text summarisation. Designing a system to produce human-quality summaries is difficult and therefore, many researchers have focused on sentence or paragraph extraction, which is a kind of summarisation. In this paper, we introduce a ne...
The influence of personal pronouns for automatic summarisation of scientific articles
In automatic summarisation, statistical methods based on tokens’ frequency are commonly used in combination with other methods or on their own to extract important sentences from a text. Quite often researchers justify the relatively poor performance of these statistical methods by the fact that they do not consider the anaphoric relations between words. In this paper, we perform a comprehensiv...
Opinion-aware information management: statistical summarisation and knowledge representation of opinions
Nowadays, an increasing number of media platforms provide the users with opportunities for sharing their opinions about products, companies or people. In order to support users accessing opinion-based information, and to support engineers building systems that require opinion-aware reasoning, intelligent opinion-aware tools and techniques are needed. This thesis contributes methods and technolog...
Journal title:
- Inf. Process. Manage.
Volume 43, Issue
Pages -
Publication date: 2007